Convergence of Value Aggregation for Imitation Learning

Authors

  • Ching-An Cheng
  • Byron Boots
Abstract

Value aggregation is a general framework for solving imitation learning problems. Based on the idea of data aggregation, it generates a policy sequence by iteratively interleaving policy optimization and evaluation in an online learning setting. While the existence of a good policy in the policy sequence can be guaranteed non-asymptotically, little is known about the convergence of the sequence or the performance of the last policy. In this paper, we debunk the common belief that value aggregation always produces a convergent policy sequence with improving performance. Moreover, we identify a critical stability condition for convergence and provide a tight non-asymptotic bound on the performance of the last policy. These new theoretical insights let us stabilize problems with regularization, which removes the inconvenient process of identifying the best policy in the policy sequence in stochastic problems.
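As a rough illustration of the stability idea mentioned in the abstract, the sketch below runs a follow-the-leader style aggregation on a one-dimensional toy problem. Everything in it is an assumption made for illustration and is not taken from the paper: the per-round loss f_n(x) = F(x_n, x) + (reg/2) x^2 with F(y, x) = (alpha/2) x^2 - beta*y*x + c*x, the function name `aggregate_iterates`, and the use of the ratio beta/(alpha + reg) as a stand-in for a stability measure. In this toy, each new iterate minimizes the sum of all past losses, and the loss at round n depends on the iterate x_n that generated it.

```python
def aggregate_iterates(alpha, beta, c=1.0, reg=0.0, x0=0.0, rounds=2000):
    """Follow-the-leader on hypothetical toy losses (not the paper's setup).

    Round n uses the loss
        f_n(x) = (alpha / 2) * x**2 - beta * x_n * x + c * x + (reg / 2) * x**2,
    which depends on the current iterate x_n, mimicking how the data
    distribution in imitation learning depends on the current policy.
    """
    xs = [x0]
    total = x0  # running sum of all iterates so far
    for n in range(1, rounds):
        # Closed-form minimizer of sum_{m <= n} f_m(x):
        #   n * (alpha + reg) * x - beta * sum(x_m) + n * c = 0
        x_next = (beta * total / n - c) / (alpha + reg)
        xs.append(x_next)
        total += x_next
    return xs


def fixed_point(alpha, beta, c=1.0, reg=0.0):
    """Point x with x = (beta * x - c) / (alpha + reg); needs alpha + reg != beta."""
    return -c / (alpha + reg - beta)
```

In this toy, the iterates settle near the fixed point when beta/(alpha + reg) is below 1 and drift away when it is above 1. Setting `reg` large enough pushes the ratio below 1 and restores convergence, at the cost of moving the fixed point, since the regularized problem is a different problem.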


Similar resources

Accelerating Reinforcement Learning through Implicit Imitation

Imitation can be viewed as a means of enhancing learning in multiagent environments. It augments an agent’s ability to learn useful behaviors by making intelligent use of the knowledge implicit in behaviors demonstrated by cooperative teachers or other more experienced agents. We propose and study a formal model of implicit imitation that can accelerate reinforcement learning dramatically in ce...


Smooth Imitation Learning for Online Sequence Prediction

We study the problem of smooth imitation learning for online sequence prediction, where the goal is to train a policy that can smoothly imitate demonstrated behavior in a dynamic and continuous environment in response to online, sequential context input. Since the mapping from context to behavior is often complex, we take a learning reduction approach to reduce smooth imitation learning to a re...


Learning to Play Approximate Nash Equilibria in Games with Many Players

We illustrate one way in which a population of boundedly rational individuals can learn to play an approximate Nash equilibrium. Players are assumed to make strategy choices using a combination of imitation and innovation. We begin by looking at an imitation dynamic and provide conditions under which play evolves to an imitation equilibrium; convergence is conditional on the network of social i...


Model-Free Imitation Learning with Policy Optimization

In imitation learning, an agent learns how to behave in an environment with an unknown cost function by mimicking expert demonstrations. Existing imitation learning algorithms typically involve solving a sequence of planning or reinforcement learning problems. Such algorithms are therefore not directly applicable to large, high-dimensional environments, and their performance can significantly d...


How to Choose the Bidding Strategy in Continuous Double Auctions: Imitation Versus Take-The-Best Heuristics

Human-subject market experiments have established in a wide variety of environments that the Continuous Double Auction (CDA) guarantees the maximum efficiency (100 percent) and that transaction prices converge quickly to the competitive equilibrium price. Since in human-subject experiments we cannot control the agents' behaviour, one would like to know if these properties (quick price convergenc...



Journal title:
  • CoRR

Volume abs/1801.07292

Publication date 2017